Genome analysis

Canonical, stable, general mapping using context schemes

Authors

  • Adam M. Novak
  • Yohei Rosen
  • David Haussler
  • Benedict Paten
Abstract

Motivation: Sequence mapping is the cornerstone of modern genomics. However, most existing sequence mapping algorithms are insufficiently general.

Results: We introduce context schemes: a method that allows the unambiguous recognition of a reference base in a query sequence by testing the query for substrings from an algorithmically defined set. Context schemes map only when there is a unique best mapping, and they define this criterion uniformly for all reference bases. Mappings under context schemes can also be made stable, so that extending the query string (e.g. by increasing read length) will not alter the mapping of previously mapped positions. Context schemes are general in several senses. They natively support the detection of arbitrarily complex, novel rearrangements relative to the reference. They can scale over orders of magnitude in query sequence length. Finally, they are trivially extensible to more complex reference structures, such as graphs, that incorporate additional variation. We demonstrate empirically the existence of high-performance context schemes, and present efficient context scheme mapping algorithms.

Availability and implementation: The software test framework created for this study is available from https://registry.hub.docker.com/u/adamnovak/sequence-graphs/.

Supplementary information: Supplementary data are available at Bioinformatics online.
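The Results paragraph describes recognizing a reference base by testing the query for substrings from an algorithmically defined set, mapping only when the match is unique, and keeping mappings stable as the query grows. The Python below is a minimal, hypothetical sketch of that idea under simplifying assumptions: a linear string reference, left-side contexts only, and the rule that a query base maps to reference position p when the shortest query substring ending at that base occurs exactly once in the reference, ending at p. It is not the paper's context-scheme definition or its mapping algorithm; the function names `occurrences` and `map_query` are illustrative.

```python
from typing import List, Optional


def occurrences(reference: str, pattern: str) -> List[int]:
    """Return the 0-based end index of every (possibly overlapping) occurrence."""
    ends = []
    start = reference.find(pattern)
    while start != -1:
        ends.append(start + len(pattern) - 1)
        start = reference.find(pattern, start + 1)
    return ends


def map_query(reference: str, query: str) -> List[Optional[int]]:
    """Hypothetical left-context mapping sketch (not the paper's algorithm).

    Query base i maps to reference position p if the shortest substring of
    the query ending at i that occurs exactly once in the reference ends at p.
    """
    mapping: List[Optional[int]] = []
    for i in range(len(query)):
        target = None
        # Grow the left context one base at a time, shortest first.
        for k in range(1, i + 2):
            ends = occurrences(reference, query[i - k + 1 : i + 1])
            if len(ends) == 1:
                target = ends[0]  # unique match: map this base
                break
            if not ends:
                break  # novel sequence: no longer context can match either
        mapping.append(target)  # None means unmapped (ambiguous or novel)
    return mapping


if __name__ == "__main__":
    ref = "ACGTTACG"
    print(map_query(ref, "TTACG"))   # [None, 4, 5, 6, 7]
    # Extending the query leftward maps one more base; old mappings persist.
    print(map_query(ref, "GTTACG"))  # [None, 3, 4, 5, 6, 7]
```

Because each decision in this sketch depends only on a base's shortest unique left context, which is already present in the query, extending the query on either side cannot change a mapping already made; it can only map bases that were previously unmapped, as the second call illustrates. The repeated `str.find` scans are slow on long sequences; a practical implementation would answer occurrence queries with a full-text index such as a suffix array or FM-index.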

Similar sources

An Efficient Optimization Algorithm for Structured Sparse CCA, with Applications to eQTL Mapping

In this paper we develop an efficient optimization algorithm for solving canonical correlation analysis (CCA) with complex structured-sparsity-inducing penalties, including overlapping-group-lasso penalty and network-based fusion penalty. We apply the proposed algorithm to an important genome-wide association study problem, eQTL mapping. We show that, with the efficient optimization algorithm, ...

Crab biodiversity under different management schemes of mangrove ecosystems

Reforestation is one of the Philippines’ government efforts to restore and rehabilitate degraded mangrove ecosystems. Although there is recovery of the ecosystem in terms of vegetation, the recovery of closely-linked faunal species in terms of community structure is still understudied. This research investigates the community structure of mangrove crabs under two different management schemes: p...

Identification of genomic loci controlling phenologic and morphologic traits in barley (Hordeum vulgare L.) genotypes using association analysis

Association mapping is a technique with high resolution for QTL mapping based on linkage disequilibrium and has shown more promise for describing genetically complex traits. In addition, it is a powerful tool for describing complex agronomic traits and identifying alleles that can contribute to enhancing the desired traits. In this study, whole genome association mapping was used in a set of 14...

Publication date: 2015